Smart Paper Notes

Adversarially trained neural representations may already be as robust as corresponding biological neural representations

Chong Guo , Michael J. Lee , Guillaume Leclerc , Joel Dapello , Yug Rao , Aleksander Madry , James J. DiCarlo

Category: Machine Learning

2022-06-19

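The visual systems of primates are the gold standard of robust perception. There is therefore a general belief that mimicking the neural representations underlying those systems will yield artificial visual systems that are adversarially robust. In this work, we develop a method for performing adversarial visual attacks directly on primate brain activity. We then use this method to demonstrate that the above belief might not be well founded. Specifically, we report that the biological neurons that make up primate visual systems exhibit a susceptibility to adversarial perturbations that is comparable to that of existing (robustly trained) artificial neural networks.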

translated by Google Translate

Neural Population Geometry Reveals the Role of Stochasticity in Robust Perception

Joel Dapello , Jenelle Feather , Hang Le , Tiago Marques , David D. Cox , Josh H. McDermott , James J. DiCarlo , SueYeon Chung

Categories: Machine Learning | Neural and Evolutionary Computing

2021-11-12

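Neuroscientists and machine learning researchers often cite adversarial examples as an illustration of how computational models diverge from biological sensory systems. Recent work has proposed adding biologically inspired components to visual neural networks as a way to improve their adversarial robustness. One surprisingly effective component for reducing adversarial vulnerability is response stochasticity, like that exhibited by biological neurons. Here, using recently developed geometric techniques from computational neuroscience, we investigate how adversarial perturbations influence the internal representations of standard, adversarially trained, and biologically inspired stochastic networks. We find distinct geometric signatures for each type of network, revealing different mechanisms for achieving robust representations. Next, we generalize these results to the auditory domain, showing that neural stochasticity also makes auditory models more robust to adversarial perturbations. Geometric analysis of the stochastic networks reveals overlap between the representations of clean and adversarially perturbed stimuli, and quantitatively demonstrates that competing geometric effects of stochasticity mediate a tradeoff between adversarial and clean performance. Our results shed light on the strategies of robust perception used by adversarially trained and stochastic networks, and help explain how stochasticity may benefit both machine and biological computation.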


Combining Different V1 Brain Model Variants to Improve Robustness to Image Corruptions in CNNs

Avinash Baidya , Joel Dapello , James J. DiCarlo , Tiago Marques

Category: Computer Vision

2021-10-20

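While some convolutional neural networks (CNNs) have surpassed human visual abilities in object classification, they often struggle to recognize objects in images corrupted with different types of common noise patterns, highlighting a major limitation of this family of models. Recently, it has been shown that simulating a primary visual cortex (V1) at the front of CNNs leads to small improvements in robustness to these image perturbations. In this study, we start from the observation that different variants of the V1 model show gains for specific corruption types. We then build a new model using an ensembling technique that combines multiple individual models with different V1 front-end variants. The model ensemble leverages the strengths of each individual model, leading to significant improvements in robustness across all corruption categories and outperforming the base model by 38% on average. Finally, we show that, using distillation, the knowledge in the ensemble model can be partially compressed into a single model with a V1 front-end. While the ensembling and distillation techniques used here are hardly biologically plausible, the results demonstrate that by combining the specific strengths of different neuronal circuits in V1 it is possible to improve the robustness of CNNs to a wide range of perturbations.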
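The ensembling step described above can be illustrated with a minimal sketch. It assumes the ensemble simply averages the class probabilities of its member models, which is one common choice; the abstract does not say which combination rule the authors use. The function names and the logits below are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(per_model_logits):
    """Average class probabilities over models; return (best class, averaged probabilities).

    Assumption: plain probability averaging, not necessarily the paper's rule.
    """
    probs = [softmax(logits) for logits in per_model_logits]
    n_models = len(probs)
    n_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: mean[c])
    return best, mean


# Hypothetical logits from three V1 front-end variants for one image, 3 classes.
logits = [
    [2.0, 1.0, 0.1],
    [0.5, 2.5, 0.2],
    [1.8, 1.9, 0.3],
]
label, mean_probs = ensemble_predict(logits)
print(label, [round(p, 3) for p in mean_probs])
```

In this toy input the first model alone would pick class 0, while the averaged prediction picks class 1, which shows how members with different strengths can change the combined decision.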


ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation

Chuang Gan , Jeremy Schwartz , Seth Alter , Damian Mrowca , Martin Schrimpf , James Traer , Julian De Freitas , Jonas Kubilius , Abhishek Bhandwaldar , Nick Haber

Categories: Computer Vision | Machine Learning | Robotics

2020-07-09

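We introduce ThreeDWorld (TDW), a platform for interactive multi-modal physical simulation. TDW enables the simulation of high-fidelity sensory data and physical interactions between mobile agents and objects in rich 3D environments. Unique properties include: real-time near-photo-realistic image rendering; a library of objects and environments, and routines for their customization; generative procedures for efficiently building classes of new environments; high-fidelity audio rendering; realistic physical interactions for a variety of material types, including cloth, liquids, and deformable objects; customizable agents that embody AI agents; and support for human interaction through VR devices. TDW's API enables multiple agents to interact within a simulation and returns a range of sensor and physics data representing the state of the world. We present initial experiments enabled by TDW in emerging research directions in computer vision, machine learning, and cognitive science, including multi-modal physical scene understanding, physical dynamics prediction, multi-agent interaction, models that learn like a child, and attention studies in humans and neural networks.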


BD-KD: Balancing the Divergences for Online Knowledge Distillation

Ibtihel Amara , Nazanin Sepahvand , Brett H. Meyer , Warren J. Gross , James J. Clark

Category: Computer Vision

2022-12-25

Knowledge distillation (KD) has gained a lot of attention in the field of model compression for edge devices thanks to its effectiveness in compressing large powerful networks into smaller lower-capacity models. Online distillation, in which both the teacher and the student learn collaboratively, has also gained much interest due to its ability to improve the performance of the networks involved. The Kullback-Leibler (KL) divergence ensures proper knowledge transfer between the teacher and student. However, most online KD techniques present some bottlenecks under the network capacity gap. When the models are trained cooperatively and simultaneously, the KL distance becomes incapable of properly minimizing the gap between the teacher's and student's distributions. Alongside accuracy, critical edge device applications need well-calibrated compact networks. Confidence calibration provides a sensible way of getting trustworthy predictions. We propose BD-KD: Balancing of Divergences for online Knowledge Distillation. We show that adaptively balancing between the reverse and forward divergences shifts the focus of the training strategy to the compact student network without limiting the teacher network's learning process. We demonstrate that, by performing this balancing design at the level of the student distillation loss, we improve upon both the accuracy and the calibration of the compact student network. We conducted extensive experiments using a variety of network architectures and show improvements on multiple datasets including CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet. We illustrate the effectiveness of our approach through comprehensive comparisons and ablations with current state-of-the-art online and offline KD techniques.
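To make the forward and reverse divergences concrete, here is a small sketch of both KL directions between a teacher and a student distribution, combined with a weight. The fixed weight `alpha` and the function names are illustrative assumptions; the abstract says the balance is adaptive but does not give the rule, so this is not the BD-KD loss itself.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def balanced_divergence(teacher, student, alpha):
    """Weighted mix of forward KL(teacher || student) and reverse KL(student || teacher).

    Assumption: a fixed scalar alpha in [0, 1] stands in for the adaptive weighting.
    """
    forward = kl_divergence(teacher, student)
    reverse = kl_divergence(student, teacher)
    return alpha * forward + (1.0 - alpha) * reverse


# Hypothetical softened class probabilities for one sample.
teacher = [0.7, 0.2, 0.1]
student = [0.4, 0.4, 0.2]

forward = kl_divergence(teacher, student)
reverse = kl_divergence(student, teacher)
mixed = balanced_divergence(teacher, student, alpha=0.5)
print(forward, reverse, mixed)
```

The two directions give different values for the same pair of distributions, because KL divergence is not symmetric; that asymmetry is what makes the choice of balance meaningful.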


KronA: Parameter Efficient Tuning with Kronecker Adapter

Ali Edalati , Marzieh Tahaei , Ivan Kobyzev , Vahid Partovi Nia , James J. Clark , Mehdi Rezagholizadeh

Category: Natural Language Processing

2022-12-20

Fine-tuning a Pre-trained Language Model (PLM) on a specific downstream task has been a well-known paradigm in Natural Language Processing. However, with the ever-growing size of PLMs, training the entire model on several downstream tasks becomes very expensive and resource-hungry. Recently, different Parameter Efficient Tuning (PET) techniques have been proposed to improve the efficiency of fine-tuning PLMs. One popular category of PET methods is low-rank adaptation, which inserts learnable truncated SVD modules into the original model either sequentially or in parallel. However, low-rank decomposition suffers from limited representation power. In this work, we address this problem by using the Kronecker product instead of the low-rank representation. We introduce KronA, a Kronecker product-based adapter module for efficient fine-tuning of Transformer-based PLMs. We apply the proposed methods to fine-tuning T5 on the GLUE benchmark to show that incorporating the Kronecker-based modules can outperform state-of-the-art PET methods.
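The contrast between a low-rank update and a Kronecker-product update can be seen on a toy example. The sketch below builds a 4x4 matrix from two 2x2 factors; the helper `kron` and the example matrices are illustrative and are not taken from the KronA implementation.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        # Row (i_a, i_b) of the result: each entry of a's row scales b's row i_b.
        [a_val * b[i_b][j_b] for a_val in a_row for j_b in range(cols_b)]
        for a_row in a
        for i_b in range(rows_b)
    ]


# Two small trainable factors (8 parameters in total).
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

# Their Kronecker product is a 4x4 matrix (16 entries).
W_update = kron(A, B)
for row in W_update:
    print(row)
```

Since the rank of a Kronecker product equals the product of the factor ranks, two full-rank 2x2 factors give a rank-4 update here. Spending the same 8 parameters on a product of a 4x1 and a 1x4 matrix would only reach rank 1, which is the representational limit the abstract attributes to low-rank decomposition.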


Predicting Autonomous Vehicle Collision Injury Severity Levels for Ethical Decision Making and Path Planning

James E. Pickering , Keith J. Burnham

Category: Artificial Intelligence

2022-12-16

Developments in autonomous vehicles (AVs) are rapidly advancing, and within the next 20 years AVs will become a central part of our society. However, especially in the early stages of deployment, incidents involving AVs are expected. In the event of an AV incident, ethical decisions will need to be made, e.g., deciding between colliding with a group of pedestrians or a rigid barrier. For an AV to undertake such ethical decision making and path planning, simulation models of the situation will be required that run in real time on board the AV. These models will enable path planning and ethical decision making to be undertaken based on predetermined collision injury severity levels. In this research, models are developed for path planning and ethical decision making that predetermine knowledge regarding the possible collision injury severities, i.e., peak deformation of the AV colliding with the rigid barrier or the impact velocity of the AV colliding with a pedestrian. Based on such knowledge and using fuzzy logic, a novel nonlinear weighted utility cost function for the collision injury severity levels is developed. This allows the model-based predicted collision outcomes arising from AV peak deformation and AV-pedestrian impact velocity to be examined separately via weighted utility cost functions with a common structure. The general form of the weighted utility cost function exploits a fuzzy sets approach, thus allowing common utility costs from the two separate utility cost functions to be meaningfully compared. A decision-making algorithm, which makes use of a utilitarian ethical approach, ensures that the AV will always steer onto the path that represents the lowest injury severity level, and hence the lowest utility cost to society.


Foresight -- Deep Generative Modelling of Patient Timelines using Electronic Health Records

Zeljko Kraljevic , Dan Bean , Anthony Shek , Rebecca Bendayan , Joshua Au Yeung , Alexander Deng , Alfie Baston , Jack Ross , Esther Idowu , James T Teo

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-12-13

Electronic Health Records (EHRs) hold detailed longitudinal information about each patient's health status and general clinical history, a large portion of which is stored as unstructured text. Temporal modelling of this medical history, which considers the sequence of events, can be used to forecast and simulate future events, estimate risk, suggest alternative diagnoses or forecast complications. While most prediction approaches use mainly structured data or a subset of single-domain forecasts and outcomes, we processed the entire free-text portion of EHRs for longitudinal modelling. We present Foresight, a novel GPT3-based pipeline that uses NER+L tools (i.e. MedCAT) to convert document text into structured, coded concepts, and then provides probabilistic forecasts for future medical events such as disorders, medications, symptoms and interventions. Since large portions of EHR data are in text form, such an approach benefits from a granular and detailed view of a patient while introducing modest additional noise. On tests in two large UK hospitals (King's College Hospital, South London and Maudsley) and the US MIMIC-III dataset, precision@10 of 0.80, 0.81 and 0.91, respectively, was achieved for forecasting the next biomedical concept. Foresight was also validated on 34 synthetic patient timelines by 5 clinicians and achieved relevancy of 97% for the top forecasted candidate disorder. Foresight can be easily trained and deployed locally as it only requires free-text data (as a minimum). As a generative model, it can simulate follow-on disorders, medications and interventions for as many steps as required. Foresight is a general-purpose model for biomedical concept modelling that can be used for real-world risk estimation, virtual trials and clinical research to study the progression of diseases, simulate interventions and counterfactuals, and for educational purposes.


System Design for an Integrated Lifelong Reinforcement Learning Agent for Real-Time Strategy Games

Indranil Sur , Zachary Daniels , Abrar Rahman , Kamil Faber , Gianmarco J. Gallardo , Tyler L. Hayes , Cameron E. Taylor , Mustafa Burak Gurbuz , James Smith , Sahana Joshi

Categories: Machine Learning | Artificial Intelligence

2022-12-08

As Artificial and Robotic Systems are increasingly deployed and relied upon for real-world applications, it is important that they exhibit the ability to continually learn and adapt in dynamically-changing environments, becoming Lifelong Learning Machines. Continual/lifelong learning (LL) involves minimizing catastrophic forgetting of old tasks while maximizing a model's capability to learn new tasks. This paper addresses the challenging lifelong reinforcement learning (L2RL) setting. Pushing the state-of-the-art forward in L2RL and making L2RL useful for practical applications requires more than developing individual L2RL algorithms; it requires making progress at the systems-level, especially research into the non-trivial problem of how to integrate multiple L2RL algorithms into a common framework. In this paper, we introduce the Lifelong Reinforcement Learning Components Framework (L2RLCF), which standardizes L2RL systems and assimilates different continual learning components (each addressing different aspects of the lifelong learning problem) into a unified system. As an instantiation of L2RLCF, we develop a standard API allowing easy integration of novel lifelong learning components. We describe a case study that demonstrates how multiple independently-developed LL components can be integrated into a single realized system. We also introduce an evaluation environment in order to measure the effect of combining various system components. Our evaluation environment employs different LL scenarios (sequences of tasks) consisting of Starcraft-2 minigames and allows for the fair, comprehensive, and quantitative comparison of different combinations of components within a challenging common evaluation environment.


Semantically Enhanced Global Reasoning for Semantic Segmentation

Mir Rayat Imtiaz Hossain , Leonid Sigal , James J. Little

Categories: Computer Vision | Machine Learning

2022-12-06

Recent advances in pixel-level tasks (e.g., segmentation) illustrate the benefit of long-range interactions between aggregated region-based representations that can enhance local features. However, such pixel-to-region associations and the resulting representation, which often take the form of attention, cannot model the underlying semantic structure of the scene (e.g., individual objects and, by extension, their interactions). In this work, we take a step toward addressing this limitation. Specifically, we propose an architecture in which we learn to project image features into latent region representations and perform global reasoning across them, using a transformer, to produce contextualized and scene-consistent representations that are then fused with the original pixel-level features. Our design enables the latent regions to represent semantically meaningful concepts by ensuring that activated regions are spatially disjoint and that unions of such regions correspond to connected object segments. The resulting semantic global reasoning (SGR) is end-to-end trainable and can be combined with any semantic segmentation framework and backbone. Combining SGR with DeepLabV3 yields semantic segmentation performance that is competitive with the state of the art, while producing more semantically interpretable and diverse region representations, which we show can effectively transfer to detection and instance segmentation. Further, we propose a new metric that allows us to measure the semantics of representations at both the object class and instance level.
